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Abstract 


This paper explores the frontiers of large language models (LLMs) in psychology applications. Psychology 
has undergone several theoretical changes, and the current use of Artificial Intelligence (AI) and Machine 
Learning, particularly LLMs, promises to open up new research directions. We provide a detailed exploration 
of how LLMs like ChatGPT are transforming psychological research. It discusses the impact of LLMs across 
various branches of psychology, including cognitive and behavioral, clinical and counseling, educational and 
developmental, and social and cultural psychology, highlighting their potential to simulate aspects of human 
cognition and behavior. The paper delves into the capabilities of these models to emulate human-like text 
generation, offering innovative tools for literature review, hypothesis generation, experimental design, 
experimental subjects, data analysis, academic writing, and peer review in psychology. While LLMs are 
essential in advancing research methodologies in psychology, the paper also cautions about their technical 
and ethical challenges. There are issues like data privacy, the ethical implications of using LLMs in 
psychological research, and the need for a deeper understanding of these models' limitations. Researchers 
should responsibly use LLMs in psychological studies, adhering to ethical standards and considering the 
potential consequences of deploying these technologies in sensitive areas. Overall, the article provides a 
comprehensive overview of the current state of LLMs in psychology, exploring potential benefits and 
challenges. It serves as a call to action for researchers to leverage LLMs' advantages responsibly while 
addressing associated risks. 
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Introduction 

Artificial intelligence (AI) has a history spanning nearly seven decades, beginning with the 1956 
Dartmouth Conference. The field has recently been revolutionized by the advent of large language models 
(LLMs), such as OpenAI's ChatGPT series, Google's Bard, and Meta's Llama. Notably, OpenAI's GPT-4, in 


particular, may signify a paradigm shift by demonstrating impressive capabilities (e.g., it solved difficult 


tasks in math, coding, vision, medicine, law, psychology, etc.) (Bubeck et al., 2023), that is to say, AI for 
Science (H. Wang et al., 2023). LLMs mark a critical juncture in the evolution of machine learning and AI, 
propelled by their expansive size and sophisticated neural architectures that incorporate attentional 
mechanisms (Vaswani et al., 2017). Due to the integration of cognitive mechanisms (Binz & Schulz, 2023a), 
these models have acquired the ability to exhibit emergent behaviors akin to complex physical systems (Wei 
et al., 2022), which has not only enhanced their ability to understand concepts and high-level semantics (J. 
Li et al., 2022) but also deepened our insights into cognitive processes (Sejnowsk1i, 2022). In psychological 
applications, these developments are reshaping interactions and comprehension of data, language, and our 
environment (De Bot et al., 2007; Demszky et al., 2023), contributing significantly to various fields, 
including clinical (Thirunavukarasu et al., 2023), development (Frank, 2023; Hagendorff, 2023), and social 
psychology (Demszky et al., 2023; Hardy et al., 2023; Zhang et al., 2023). Moreover, they profoundly impact 


psychological research methodologies, offering novel approaches and tools for exploration and analysis. 


1.1. The concept of LLMs: From machine learning to capability emergence 


Machine learning, particularly natural language processing (NLP), has significantly progressed in the 
last decade. However, the emergence of LLMs such as GPT-4 and its predecessors marks a significant leap 
in AI capabilities. LLMs are deep learning models designed to process natural language text and generate 
human-like responses or texts. Their capability emergence is defined as a qualitative change in behavior due 
to a quantitative change in the system and thus a qualitative change in behavior, i.e., a capability is emergent 
if it does not exist in a smaller model and exists in a larger model (Wei et al., 2022). 

At the heart of this LLM is the transformer architecture, a deep neural network with an attentional 
mechanism to efficiently process sequential data in parallel (Vaswani et al., 2017), which works somewhat 
similarly to the human brain functions. This architecture has revolutionized the field of Natural Language 
Processing (NLP). The self-attention mechanism of the transformer architecture captures contextual 
relationships in textual data, allowing for more sophisticated language understanding. On the one hand, the 
"large" in LLMs refers to many parameters and massive amounts of training data used to fine-tune these 
models, typically billions of parameters and terabytes of text (Binz & Schulz, 2023b), and on the other, 
means master the world model (Yildirim & Paul, 2023). 

The process of large language modeling, from machine learning to the emergence of competence, can be 


divided into several key stages (Demszky et al., 2023). First, pre-training: LLMs are pre-trained on large 


amounts of textual data to learn intricate linguistic, syntactic, and textual structures (P. Liu et al., 2023). This 
unsupervised learning phase lays the foundation for the big language model to understand the language. 
Second, fine-tuning: LLM can be fine-tuned for a specific task or domain after pre-training to make it 
adaptable and suitable for various applications (Liu et al., 2022). This fine-tuning process ensures that the 
model can generate contextually relevant responses and engage in meaningful conversations or tasks. Third, 
language comprehension: LLMs have demonstrated a remarkable ability to understand and develop human- 
like text. They can answer questions, write articles, summarize content, translate language, and even do 
creative writing (Bubeck et al., 2023). Their skillful understanding of context is an essential factor 
contributing to the emergence of their intelligence. Fourth is the emergence of capabilities: LLMs exhibit 
"capability emergence" when integrated into various applications and systems. They can perform tasks that 
require a deep understanding of language and context, often achieving human-like or superhuman 
performance in specific domains (OpenAI, 2023), such as analogical reasoning (Webb et al., 2023), creativity 
(Stevenson et al., 2022), and emotion recognition (Patel & Fan, 2023). 

Therefore, LLMs offer intriguing insights into how these technologies can mimic or augment human 
cognitive processes. For instance, LLMs' ability to understand and generate natural language echoes aspects 
of human linguistic and cognitive skills (Goertzel, 2023). This parallel allows for exploration into AI 
applications in the cognitive psychology (Sartori & Orru, 2023), language acquisition (Jungherr, 2023), and 
even the mental health (Lamichhane, 2023). Moreover, the study of LLMs contributes to our understanding 
of the human mind, offering a computational perspective on language processing, the decision-making (Sha 
et al., 2023), and learning mechanisms (Hendel et al., 2023). Fusing these disciplines advances AI's 


capabilities and deepens our comprehension of the human mind. 


1.2. Psychology and AI 


Psychology, as a science that explores the human mind and behavior, has undergone significant 
theoretical changes since the late 19th century, with psychoanalysis and behaviorism extending to cognitive 
psychology (Hothersall & Lovett, 2022). This history not only marks a shift in the focus of research in 
psychology but also reflects the academic trend from observing behavioral manifestations to exploring in- 
depth psychological connotations. Each of these phases has led to a deepening understanding of the psycho- 
cognitive processes of human beings. 


Understanding human psycho-cognitive processes is therefore crucial to psychology. In clinical and 


counseling psychology, research in cognitive psychology supports diagnosing and treating psychological 
disorders. It deepens understanding of the psychological mechanisms underlying emotions, stress, and 
human behavior. Psychotherapies, such as cognitive-behavioral therapy (Hofmann et al., 2012) and 
psychodynamic therapy, have become essential tools for promoting mental health and emotional regulation. 
In educational and developmental psychology, the development of cognitive psychology has fostered a 
deeper understanding of the role of perceptual and affective factors in the learning process (Glaser, 1984), 
which has led to innovations in teaching methods and learning strategies. In social and cultural psychology, 
cognitive psychology research helps explain individuals’ behavior and mental processes in different social 
and cultural contexts. It explores how cultural differences affect individuals’ cognitive patterns, values, and 
behavioral norms, especially in globalization, interaction, and integration. Meanwhile, in social psychology, 
cognitive psychology's research on group behavior, social influence, prejudice, and discrimination is of great 
value in promoting social harmony and mutual understanding (Park & Judd, 2005). 

Al is a growing force in psycho-cognitive research. For example, Simon (1979) recognized the potential 
of computational models to simulate human cognitive processes early on. The recent emergence of LLMs, 
represented by OpenAI's GPT family (mainly including GPT-3, ChatGPT, and GPT-4), which can process 
and generate human-like texts and perform close to humans in some cognitive tasks (Bubeck et al., 2023). 
More so, it provides a unique perspective to study human cognition. For example, GPT-3 can solve vignette- 
based tasks similarly or better than human subjects and can make rational decisions based on descriptions, 
outperforming humans in the multi-armed bandit task (Binz & Schulz, 2023b). Furthermore, after extensive 
testing, GPT-3 can solve complex analogical problems at levels comparable to human performance, and 
analogical reasoning is an essential hallmark of human intelligence (Webb et al., 2023). Furthermore, fine- 
tuning across multiple tasks could allow LLMs to predict human behavior in previously unseen tasks, i.e., 
LLMs could be adapted to general-purpose cognitive models (Binz & Schulz, 2023a), potentially opening 
up new research directions that could transform cognitive psychology and behavioral science in general. 

Newell's time-scale theory provides a multidimensional framework for understanding human behavior 
Newell (1990). In his seminal work, Newell (1990) articulates a nuanced framework for comprehending 
cognitive processes, stratifying human behavior across four distinct temporal levels (see Fig. la). At the 
foundational biological level, he addresses core biological and physiological processes characterized by 
rapid time scales fluctuating from approximately one millisecond to one second. This level might include 


neural responses and sensory processing, foundational to human cognition. Advancing to the cognitive layer, 


Newell examines fundamental cognitive mechanisms operating on intermediate time scales, typically from 
one second to around one minute. This layer could encompass basic cognitive operations such as attention, 
perception, and short-term memory. Further, at the rational layer, the focus shifts to more elaborate and 
sustained cognitive activities. These processes, often extending from several minutes to a few hours, might 
involve complex problem-solving, planning, and decision-making activities requiring a higher level of 
cognitive engagement. Finally, the social layer encapsulates human behavior within social interactions and 
structures. This level, characterized by the broadest time scales ranging from a few hours to days or more, 
delves into the dynamics of social communication, group behavior, and cultural influences on cognition. 
This layered approach underscores the multifaceted nature of human behavior, highlighting the interplay 
between rapid physiological processes and the more prolonged, socially influenced aspects of human 
cognition. 

LLMs have great potential for modeling cognition and behavior on these different time scales (see Fig. 
1b) and can provide new insights into human psycho-cognitive processes. Recent research has revealed 
significant advancements in LLMs’ ability to emulate complex human cognitive and social behaviors 
(Grossmann et al., 2023; Marjich et al., 2023; Orru et al., 2023; Pal et al., 2023; Stevenson et al., 2022; Webb 
et al., 2023). Grossmann et al. (2023) and Marjieh et al. (2023) have shown LLMs’ proficiency in simulating 
human social interactions and perceptual processing, respectively. Orru et al. (2023) and (Webb et al., 2023) 
highlighted LLMs' capabilities in complex problem-solving and reasoning, while Hagendorff et al. (2023) 
focused on decision-making processes. Stevenson et al. (2022) documented LLMs' potential for creativity, 
and Patel and Fan (2023) demonstrated their ability in emotion recognition. These findings collectively 
indicate the expanding role of LLMs in mimicking and enhancing human cognitive and social functions, 
marking significant progress in AI research. 

As a general cognitive model (Binz & Schulz, 2023a), LLMs can provide new perspectives and 
approaches to research in the fields of cognitive and behavioral psychology, clinical and counseling 
psychology, educational and developmental psychology, and social and cultural psychology in different time 


scales of human behavior (see Fig. la). 
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Fig.1 LLMs’ emergent abilities can be applied in psychological domains and work as research tools: (a) LLMs’ application in psychological 
domains from time scales of human behavior. (b) LLMs’ emergent abilities. (c) LLMs work as research tools. 

Based on these emergent abilities, LLMs can also be used as a research aid (see Fig. lc) to help 
psychologists with everything from literature review (Aydin & Karaarslan, 2022; Qureshi et al., 2023), 
experimental subjects (Dillion et al., 2023; Hutson, 2023), and data analysis (Patel & Fan, 2023; Peters & 
Matz, 2023; Rathje et al., 2023) to academic writing (Dergaa et al., 2023; Stokel-Walker, 2022) and peer 
review (Chiang & Lee, 2023; Van Dis et al., 2023). Thus, LLMs can potentially become research assistants 


for psychologists, helping them improve their research efficiency. 


1.3. Objectives and significance of the present review 


This review systematically examines the use of LLMs in various psychological domains, analyzing their 
applications over different behavioral time scales. The exploration begins with LLMs in cognitive and 
behavioral psychology (Section 2), followed by their roles in clinical and counseling psychology (Section 
3). The review then transitions to educational and developmental psychology (Section 4) and social and 
cultural psychology (Section 5), outlining LLMs' contributions in each area. To gain a deeper understanding 
of the impact of LLMs on psychological research, Section 6 will provide an overview of their potential as a 
tool for scientific research. The review also addresses challenges and future research directions in applying 
LLMs to psychological contexts. It concludes by summarizing their applications in psychology and offering 


recommendations for future work. Crucially, this review proposes integration strategies for LLMs in 


psychological research and provides insights into interpreting these models from a psychological standpoint, 


contributing to their safety and interpretability. 


LLMs in cognitive and behavioral psychology 


Within multilevel time scales of human behavior (Newell, 1990), cognitive and behavioral psychology 
has focused primarily on the study of cognitive processes on sub-hourly time scales (see Fig. 1), which 
encompass humans engaging in perception, memory, thinking, decision-making, problem-solving, and 
conscious planning. Cognitive and behavioral psychology typically employs experimental methods to study 
these cognitive processes by controlling and observing behaviors and responses under specific conditions. 
The recent emergence of LLMs has reinvigorated the discussion as to whether human cognitive abilities are 
revealed in these LLMs given sufficient training data. If the answer is yes, then it would be possible to study 
the cognitive processes of LLMs, thereby gaining knowledge of human cognitive processes and forming a 
valuable addition to existing research methods in cognitive psychology. 

Binz and Schulz (2023a) found that fine-tuning multiple tasks enabled the LLM to predict human 
behavior in previously unseen tasks, suggesting that the LLM can be adapted to become a generalist 
cognitive model. In another study, they tested the GPT-3 with tools from cognitive psychology and showed 
that it made better decisions and outperformed humans in a multiarmed bandit task (Binz & Schulz, 2023b). 
In addition, there are other series of studies that have demonstrated that LLMs have perceptual judgment 
(Marjiech et al., 2023), reasoning (Webb et al., 2023), and decision-making abilities (Hagendorff et al., 2023), 
creativity (Stevenson et al., 2022), and problem-solving (Orru et al., 2023), and one study even demonstrated 
that the GPT-4 has the mental abilities of a seven-year-old child through a false belief task (considered the 
gold standard for testing theory of mind in humans) (Kosinski, 2023). For example, Hagendorff et al. (2023) 
explored reasoning capabilities and decision-making processes of the OpenAI GPT family by the following 
experimental method: Design a series of semantic illusion and cognitive reflection tests designed to elicit 
intuitive but erroneous responses. Apply these tasks, traditionally used to study human reasoning and 
decision-making, to OpenAI's family of generative pre-trained Transformer models. Analyze model 
performance on a Cognitive Reflection Test (CRT) task and a semantic illusions task to reveal their System 
1 and System 2 thought processes. Observe how ChatGPT models show correct responses in these tasks and 


avoid pitfalls. Evaluate the performance of the models in the CRT task by preventing them from chain- 


thinking to reason. The results show that as model size and language capability increase, the OpenAI family 
of generative pre-trained Transformer models increasingly exhibit human-like intuitive system 1 thinking 
and associated cognitive errors. Table 1 summarizes the applications of LLMs to cognitive and behavioral 
psychology. 

These research cases demonstrate that LLMs have human-like cognitive abilities (Zhuang et al., 2023). 
Studying the cognitive mechanisms of LLMs would provide new insights into human cognitive processes. 


They will provide promising avenues for advancing psychological research methodologies and 


understanding complex cognitive phenomena as they evolve. 


Author 
Sartori and 
Orr (2023) 


Hagendorff et 
al. (2023) 


Hutson 
(2023) 


Dillion et al. 
(2023) 


Zhuang et al. 
(2023) 


Grossmann et 
al. (2023) 


Loconte et al. 
(2023) 


Binz and 
Schulz 


(2023a) 


Orru et al. 
(2023) 


Hagendorff 
(2023) 


Table 1 Applications of LLMs in cognitive and behavioral psychology study. 


Research question 


The human-like properties 
LLMs exhibit in a variety of 
cognitive tasks. 


Reasoning capabilities and 
decision-making processes 
of the OpenAI GPT family 
bear any resemblance to 
human system 1 and system 
2 thought processes. 


Can AI language models be 
used to replace human 
participants in 
experiments? 


Explore whether LLMs can 
replace human participants 
in the psychological 
sciences. 


How to efficiently measure 
the cognitive abilities of 
LLMs. 


How to improve social 
science research methods in 
the context of the ongoing 
impact of LLMs on social 
science research. 


Neuropsychological 
evaluations of the 
performance of a LLM in 
terms of prefrontal 
functioning. 


How to better describe 


human decision-making 
behavior by fine-tuning 
LLMs. 


The potential of ChatGPT 
as an intelligent tool for 
problem-solving. 


How to use psychological 
methods to study the 
emergent abilities and 


Research method 


Decision-making, information search, 
deliberation, causal reasoning, Wason 
Selection Task, and Raven-like matrices. 


Analyze model performance on a Cognitive 
Reflection Test (CRT) task and a semantic 
illusions task to reveal their System 1 and 2 
thought processes. 


LLMs(e.g., GPT-3.5) were used to conduct 
the experiment instead of human 
participants. 


Making human-like moral judgments was 
assessed by analyzing the similarity of GPT- 
3.5's judgments to humans. 


A Computerized Adaptive Testing (CAT) for 
assessing cognitive ability in LLMs. 


LLMs replace human participants in data 
collection, and act as "peers" in social 
interaction studies. 


The Verbal Reasoning Test, Cognitive 
Estimation task, Metaphor, and Idioms 
Comprehension test, Winograd Schema, 
Tower of London, Hayling Sentence 
Completion Test, (Compound Remote 
Associate problems, and Social Cognition. 


By comparing the goodness-of-fit of 
different models: random guessing model, 
domain-specific model, LLaMA 
unfinetuned, and fine-tuned model. 


Verbal insight problems were administered to 
ChatGPT: the first set was referred to as 
"practice problems," while the second set 
was referred to as "transfer problems". 


Studying the behavioral patterns, emergent 
abilities, and decision-making mechanisms 
of LLMs by treating them as participants in a 


Key finding 


LLMs have demonstrated human-like 
performance in cognitive psychology. 


ChatGPT-3.5 and 4 use input-output 
context windows during chain-thinking 
reasoning, similar to how humans use 
laptop-support system 2 thinking. 


LLMs can replace human participants in 
experimental research in some cases. 


LLMs can be used as a substitute for 
human participants in some cases. 


ChatGPT has surpassed the 
programming abilities of high-ability 
college students in dynamic 
programming and search. 


LLMs have great potential for use in 
social science research because of their 
ability to model human behavior and 
generate diverse responses. 


ChatGPT exhibits disjointed cognitive 
profiles in prefrontal functioning (e.g., 
some performing better than average and 
others performing at pathological levels). 


LLMs using fine-tuning (e.g., LLaMA) 
can successfully capture human 
decision-making behavior and perform 
better than domain-specific models. 


ChatGPT's global performance in the 
practice and transfer problems was 
identical to the most likely results in the 
human sample. 


Uncovering emergent abilities in LLMs 
that cannot be detected by traditional 
natural language processing 


Dhingra et al. 
(2023) 


Shiffrin 
Mitchell 
(2023) 


and 


Binz and 
Schulz 


(2023b) 


Marjieh et al. 


(2023) 
Huang and 
Chang (2022) 


Hagendorff et 
al. (2022) 


Stevenson et 
al. (2022) 


behaviors of LLMs. 

The performance of the 
GPT-4 on a cognitive 
psychology task to 
understand how it 
processes. 


Decision-making 
mechanisms and other 
psychological qualities of 
LLMs. 


Assessing the GPT-3's 


cognitive ability. 


How LLMs predict human 
sensory judgments across 
multiple sensory 
modalities. 


How to improve and direct 


the reasoning of these 
models. 
The machine intuitive 


capabilities of human-like 
intuitive decision-making 
in GPT-3.5. 


Evaluating the performance 
of Open Al's generative 
natural language model 
GPT-3 on creativity. 


psychological experiment. 


Evaluating the performance of the GPT-4 on 
a range of cognitive psychology datasets: 
CommonsenseQA, SuperGLUE, MATH, and 
HANS. 


Two sets of experiments were conducted, 
using situational prompts from the 
psychological literature and prompts not in 
the GPT-3 training corpus. 


Vignette-Based Investigations. Decision- 
Making. Information Search. Deliberation. 
Causal Reasoning. 


Using LLMs such as GPT-3, GPT-3.5 and 
GPT-4 to predict human judgments in six 
sensory modalities (pitch, loudness, color, 
sound, taste, and timbre). 


Methods for assessing the reasoning ability 
of LLMs: fully supervised fine-tuning, 
cueing and contextual learning, problem 
decomposition, mixed-methods. 


Conducting Cognitive Reflection Test (CRT) 
and Semantic Illusion Test on GPT-3.5. 


Comparing the GPT-3 to the Alternative Uses 
Test (AUT), which is widely used by humans 
in creativity research. 


benchmarks. 


The GPT-4 has revolutionary potential to 
help machines bridge the gap between 
human and machine reasoning. 


GPT-3 outperforms humans on some 
tasks and performs poorly on others. 


GPT-3 showed surprising abilities in 
decision-making, information search, 
and thoughtfulness, but performed 
poorly in causal reasoning. 


LLMs can successfully predict human 
perceptual judgments in six modalities. 


Improving the reasoning ability of LLMs 
requires the use of training data, model 
architectures, and optimization goals 
specific to reasoning. 


GPT-3.5 systematically exhibits 
"machine intuition", i.e., produces 
human-like erroneous decisions on the 
CRT as well as semantic illusions. 


On the creativity test, humans currently 
outperformed the GPT-3 in originality 
and unexpectedness, but the GPT-3 
performed better in utility. 


3. LLMs in clinical and counseling psychology 


In multilevel time scales of human behavior (Newell, 1990), clinical and counseling psychology would 
involve the assessment of everyday behavioral acts (about a few hours to a day), habitual thinking (about a 
day to a few months), and psychological disorders, among others (a few months to many years) (see Fig.1). 
Clinical and counseling psychology focuses on assessing, diagnosing, treating, and preventing individual 
mental health problems. These processes often involve medium- to long-term periods. According to related 
reports, there is a public rush to use LLMs such as the ChatGPT for mental health screening and treatment 
(Demszky et al., 2023). LLMs are expected to be used in clinical and counseling because they can parse 
human language and generate human-like responses, categorize text, and flexibly adapt conversational styles 
representing different theoretical orientations (Stade et al., 2023). So, how do LLMs work in psychotherapy, 
and can they replace the role of the human psychotherapist? 

LLMs are a basic generalized model with the ability to learn from small samples (Brown et al., 2020), a 
capability that allows them to quickly become experts in the clinical and counseling domain with only a 
small amount of data to learn from. For example, LLMs trained on clinical content can identify more specific 


factors of change that can help psychologists understand the process of clinical interventions, thus opening 


the black box of psychotherapy (Schueller & Morris, 2023). Additionally, related studies have shown that 
LLMs can correctly recognize emotions and respond to (Patel & Fan, 2023; Schaaff et al., 2023) 
appropriately and that human-computer collaborations in clinical psychological support result in more 
empathy (Sharma et al., 2023). Also, LLMs can accomplish mental health assessments (Elyoseph & 
Levkovich, 2023; Kjell et al., 2023) and individualized interventions (Blyler & Seligman, 2023a, 2023b). 
Blyler and Seligman (2023a) proposed an individualized intervention: Participants were recruited from 
previous studies and were 18 or older. From the five narrative identities generated by ChatGPT-4 that were 
rated as "completely accurate," five participants representing different backgrounds and experiences were 
selected. The participants ' narratives were provided through a dialog with the ChatGPT-4, and the AI was 
asked how it would guide life coaching based on the narrative identities. Based on the Al-generated narrative 
identities and recommended coaching methods, ChatGPT-4 was asked how to recommend specific 
interventions. The results suggest that the coaching strategies and interventions generated by ChatGPT-4 
make perfect sense based on narrative identity. Table 2 summarizes the applications of LLMs to clinical and 
counseling psychology. 

These research cases, which demonstrate the ability of LLMs to provide clinicians with adequate mental 
health support (Schueller & Morris, 2023), hold promise to address the lack of capacity in the mental health 
care system as it continues to evolve and may provide more individualized treatment services and even have 
the potential to fully automate psychotherapy in the future (Stade et al., 2023). Of course, it is essential in 
this process to ensure that LLM is safe and privacy-protective in psychotherapy. 


Table 2 Applications of LLMs in clinical and counseling psychology study. 


Author 


Carlbring et 


Research question 


Can AI be utilized to 


Research method 


Internet intervention methods: real-time video therapy, 


Key finding 


AI can work with therapists to 


al. (2023) improve the digital self-help programs, combining Internet improve outcomes. 
effectiveness of interventions with face-to-face therapy, etc. Potential 
Internet interventions? applications of AI in Internet interventions: virtual 
psychological coaches, psychotherapists, etc. 
(Blyler & How a person's ChatGPT-4 generates personalized narrative identities  ChatGPT-4 generates highly 
Seligman, narrative identity can based on stream-of-consciousness thoughts and credible coaching strategies and 
be used to provide demographic information. Then, it provided targeted highly credible specific 
2023a) coaches and therapists coaching methods and interventions based on interventions based on the 
with a personalized narratives and analyzed the feasibility of the coaching narrative identities it constructs. 
approach to strategies and specific interventions. 
intervention. 
Blyler and The potential of AI in Stream-of-consciousness reflections and basic Alcan support self-discovery in 
Seligman psychological practice, demographic information were processed through the psychotherapy and coaching. 
(20236) Al-generated personal ChatGPT-4 to generate personal narratives and then 


narratives in therapy 
and counseling to 


promote self- 


evaluated these Al-generated narratives for accuracy, 
level of surprise, and illuminating. 


Abd-Alrazaq 
et al. (2019) 


Elyoseph and 
Levkovich 
(2023) 


Kjell et al. 
(2023) 


Pal et al. 
(2023) 


(J. M. Liu et 
al., 2023) 


Schueller and 


Morris (2023) 


Sharma et al. 


(2023) 


Graber-Stiehl 
(2023) 


Stade et al. 
(2023) 


Zhong et al. 
(2023) 


discovery. 


How the features and 
applications of 
chatbots are meeting 
the needs of people in 
the field of mental 
health. 


This paper seeks to 
address the potential 
and limitations of the 
AI language model 
ChatGPT for suicide 
risk assessment. 


This paper seeks to 
explore how the use of 
LLMs 
psychological 
assessment. 


can change 


Bias in LLMs age and 
gender dimensions. 


How to provide 
effective counseling 
services in the field of 
mental health support 
through the use of 
LLM. 


The use of LLMs in the 
field of clinical and 
counseling psychology 


and how they can 
provide better 
psychological 
interventions. 


How AI systems can be 
utilized to assist peer 


supporters in 
improving empathy 
expression. 

Are AI treatments 
ready for mainstream 
adoption in today's 
world? 

How to responsibly 


develop and evaluate 
LLMs while realizing 
their potential value in 
behavioral health. 


Challenges of using 
LLMs in psychiatric 
research and practice. 


Chatbots in mental health are characterized by a wide 
range of applications, a variety of interactions, rule- 
driven or machine-learning generation of responses, 
the use of virtual representatives, and stand-alone 
software or web-based platform implementations. 


Using a hypothetical case study, ChatGPT was asked 
about mental health indicators in a hypothetical patient 
with varying degrees of feelings of self-burden and 
frustrated belonging. ChatGPT's assessments were 
then compared to those of mental health professionals. 


Analyze natural language responses using LLMs to 
extract mental health-related information and the 
strengths (e.g., accuracy, scope, parsing, and openness) 
and limitations (e.g., bias, risk, and ethical issues) of 
LLMS for assessing mental health. 


Datasets: i2b2 2006 smoking and i2b2 2008 obesity 
datasets. Classification task: multi-label classification 
of sub-cases. Model selection: BERT-based language 
models. Training and optimization: trained for 1000 
epochs and optimized using Adam optimizer. 
Evaluation metrics: Micro Fl-score. Bias analysis: 
age, gender, and cross subgroups. 


Adapting instructions using the GPT-4 to generate 
question-answer pairs based on collecting recordings 
of real counseling sessions. Introducing the 
Counseling Bench assessment framework to evaluate 
the effectiveness. Automated evaluation using GPT-4 
to compare the performance of ChatCounselor with 
other LLMs. 


Support therapists to improve their skills. Support lay 
people to deliver effective, evidence-based practice. 
Developing novel digital psychotherapy interventions 
(DMHIs). Analyze therapist discourse to identify 
factors that predict symptom improvement. Identifying 
more specific change factors and factors associated 
with different types of treatment. 


Development of an intelligent feedback system called 
HAILEY (Human-AI collaboration approach for 
EmpathY) to evaluate the role of AI in improving 
empathic responses of peer supporters. 


Koko app users were given the option to get more 
complete advice from Kokobot (an assistant based on 
GPT-3), and users could edit or adapt the response to 
their needs and send it. 


Focus on evidence-based practice. Commitment to 
rigorous but pragmatic evaluation. Involves 
interdisciplinary collaboration. Focuses on trust and 
availability for therapists and patients. Designs 
effective clinical LLM standards. 


Improve diagnostic accuracy by analyzing large 
datasets of patient information and identifying patterns 
that are difficult for humans to detect. Identify 
individual patient characteristics that predict response 
to treatment and suggest treatment plans tailored to 
each patient. 


Chatbots focus primarily on 
depression and autism and most 
implementations are in 
developed countries. 


ChatGPT underestimated the 
risk of suicide attempts in all 
scenarios, compared to mental 
health professionals. 


LLMs have the potential to 
transform psychological 
assessments away from reliance 
on rating scales and towards 
using the language in which 
people naturally communicate. 


By creating population 
subgroups based on age and 
gender, found that most of the 
cross-cutting subgroups 
exhibited amplification of bias. 


ChatCounselor is able to 
generate interactive 
meaningful responses to 
provide personalized mental 
health support to users. 


and 


LLMs have a wide range of 
applications in clinical science 
and practice 
humans provide 
interventions, but 


that can help 
better 

will not 

completely replace therapists. 


AI can help peer supporters 
demonstrate higher levels of 
empathy. 


Despite the promise of AI in 
mental health, there are still 
many ethical and safety issues 
with current AI therapies. 


LLMs have great potential in the 
field of psychotherapy. 


LLMs have great potential in 
psychiatric research and 
practice, but there are concerns 
(e.g., reliability, 


transparency, accountability, 


accuracy, 


and ethical issues). 


4. LLMs in educational and developmental psychology 


Within multilevel time scales of human behavior (Newell, 1990), educational and developmental 
psychology is primarily positioned at the relatively medium- to long-term level (see Fig.1), which reflects 
the ongoing learning and development that characterizes the educational process. Educational and 
developmental psychology is concerned with the learning process, the accumulation of knowledge, the 
development of skills, and the changes in the individual psyche within the educational environment. 
According to a national survey, it was found that only three months after the public release of ChatGPT, 40% 
of U.S. teachers used it weekly for lesson planning (Demszky et al., 2023). 

Table 3 summarizes the applications of LLMs to educational and developmental psychology. The 
potential for using LLMs in educational and developmental psychology is manifold. They can facilitate 
personalized learning and play an important role in emotion recognition, mental health aids, educational 
assessment, and improving learning motivation. Specifically, LLMs learn from massive amounts of data 
from the Internet and books (Binz & Schulz, 2023b), can be used as more knowledgeable learning aids 
(Stojanov, 2023), provide personalized learning experiences (Kasneci et al., 2023), will enhance motivation 
to learn (Ali et al., 2023). For example, Stojanov (2023) explored the potential of ChatGPT as a learning tool 
using the following method: He began his learning journey by setting learning objectives and conversing 
with ChatGPT about its functionality over four hours. Over the next three hours, he continued the discussion 
with ChatGPT and watched some relevant videos on YouTube. He felt positive feedback from his interactions 
with ChatGPT and found it a motivating and relevant learning experience. 


Table 3 Applications of LLMs in educational and developmental psychology study. 


Author Research question Research method Key finding 

Frank (2023) LLMs show impressive Ensure LLMs have not been pre-exposed to the By drawing on approaches from 
capabilities, but it is not stimuli used in the experiment. Selecting simplified developmental psychology, 
clear what abstractions Stimuli. Need for evidence of convergence across researchers can better 

P : : multiple experimental tasks and developmental understand the representations 
underlie their behavior. 
processes. used by LLMs. 

Han (2023) LLMs can aid research in Using the Behavioral Definitional Issue Test (bDIT). | LLMs can serve as a useful tool 
ethics education and Examine ChatGPT's learning and reasoning in moral education and 
development, especially capabilities by requesting it to read and extract moral developmental research to 
involving empirical and information from the letter from Martin Luther predict the potential effects of 
practical inquiry. King. Using three stories of moral exemplars, educational inputs on students' 

analyze whether ChatGPT produces emotional and cognitive, motivational, and 
motivational responses when observing these behavioral processes. 
stories. 

Stojanov Explore the experience of Using an autobiographical research methodology, Instant answers help to create a 

(2023) using ChatGPT asa more this study explored the role of ChatGPT asa support 'flow' experience, but in a state 
knowledgeable other to for those with more knowledge in the learning of ‘immersion’, users may 
aid in the learning process, particularly in terms of understanding overestimate their knowledge 


process, as well as the 


ChatGPT's technology. 


and understanding. 


potential and limitations. 


Kasneci et al. Opportunities and LLMs can help create and design educational LLMs have great potential in 
(2023) challenges of LLMs in content, provide personalized learning experiences, education and can provide many 
education. aid in language learning, research, and writing, benefits to students and teachers 
conduct assessments and grading, and facilitate (e.g., personalizing instruction, 
professional development. increasing student engagement 
and interaction, and creating 
educational content for learners 
with diverse needs). 
Cai et al. Explore whether the 12 pre-registered psycholinguistic experiments: ChatGPT shares similarities 
(2023) ChatGPT is similar to Speech: sound-shape association. Speech: sound- with humans in language 
humans in language gender association. Words: word length and comprehension and production 
comprehension and predictivity. Words: word meaning priming. Syntax: but also shows different patterns 
production, and how it structural priming. Syntax: syntactic ambiguity in some areas. 
performs on multiple resolution. Meaning: implausible sentence 
psycholinguistic tasks. interpretation. Meaning: semantic illusions. 
Discourse: implicit causality. Discourse: drawing 
inferences. Interlocutor sensitivity: word meaning 
access. Interlocutor sensitivity: lexical retrieval. 
Interlocutor sensitivity: word meaning access. 
Ali et al. Explore the impact of A five-point Likert scale was used to collect | ChatGPT positively impacts the 
(2023) ChatGPT on English information about participants' perceptions of the motivation of English language 
language students' impact of the ChatGPT on learning English, as well learners. 
motivation from teachers' as whether or not the ChatGPT increased the 
and students' students' interest in English language learning, self- 
perspectives. directed learning, interacting with others, and the 
enjoyment and pleasure of learning English. 
Kosinski Explores whether LLMs The researchers used two types of gold-standard ToM, previously thought to be a 
(2023) spontaneously generate false-belief tasks: the Unexpected Contents task, uniquely human ability, may 


Theoretical Minds. 


also known as the Smarties task, and the Unexpected 
Transfer task, also known as the Maxi task or Sally- 
Anne test. 


have emerged spontaneously as 
a byproduct of LLMs' improved 
language skills. 


5. LLMs in social and cultural psychology 


In time scales of human behavior (Newell, 1990), social and cultural psychology covers a predominantly 
long-term dimension (see Fig.1), reflecting its focus not only on social interactions but also on the long-term 
behavioral patterns and mental processes of individuals in their social environments. Social and cultural 
psychology studies how individual behavior is influenced by the social and cultural environment and others 
and how individuals affect the social and cultural environment and others. These studies usually focus on 
interpersonal interactions (Tajfel, 1982), group behavior, attitude formation and change, and social cognition. 
LLMs can simulate human responses and behaviors and be used to test theories and hypotheses of human 
behavior (Grossmann et al., 2023). In social and cultural psychology, LLMs can revolutionize the field by 
analyzing large amounts of textual data, modeling social interactions, and providing valuable insights into 
human behavior and social dynamics (Salah et al., 2023). 

First, LLMs share many similarities with humans regarding social cognition. For example, research has 


found that LLMs have a variety of typical human cognitive biases in judgment and decision-making, such 


as the anchoring effect, the representativeness heuristic, and the base rate neglect (Talboy & Fuller, 2023). 
In addition, cultural psychology research has shown that there are significant differences in the cognitive 
processes of Easterners and Westerners when processing information and making judgments (Nisbett et al., 
2001) and that LLM consistently favors an Eastern holistic way of thinking (Jin et al., 2023). 

Second, LLMs have also been shown to characterize human groups in social interaction settings. For 
example, it has been shown that LLMs replicate the results of Milgram's electroshock experiments (Aher et 
al., 2023), show better gaming abilities in specific games (Akata et al., 2023), and exhibit different risk- 
taking and pro-social behaviors under different emotional states (Yukun et al., 2023). 

Next, LLMs can also serve as specific social and cultural psychological research samples. For example, 
one study explored the potential of LLMs to serve as valid proxies for specific human subgroups in social 
science research and found that LLMs contain information that goes far beyond superficial similarity, 
reflecting the complex interplay between ideas, attitudes, and sociocultural contexts that characterize human 
attitudes (Argyle et al., 2022). In addition, one study has tested LLM for personality and values, and the 
results show that their scores are all similar to those of human samples (Miotto et al., 2022). 

Therefore, LLMs have many applications in social and cultural psychology, allowing one to test theories 
and hypotheses about human behavior in social and cultural interaction settings. For example, one study 
explores whether an AI chatbot can adapt its financial decisions and pro-social behaviors through emotional 
cues as humans do (Yukun et al., 2023). The experimental design is divided into two parts: Study 1, 
investment decision-making, was chosen as the topic for this study because human investment decisions are 
susceptible to emotional cues. It is hypothesized that the investment risk-taking behavior of AI chatbots will 
be lower than that of the control group when they receive fear emotional cues and higher than that of the 
control group when they receive joy emotional cues. By providing the bots with different emotional cues 
(fear, joy, or no emotion), their responses in terms of investment decisions were collected and analyzed. 
Study 2 measured the pro-social responses exhibited by an AI chatbot by providing it with emotional cues 
of anxiety and joy by donating money to a sick friend. Like Study 1, whether emotional cues influenced its 
pro-social behavior was explored by collecting and analyzing the robot's responses under different emotional 
cues. Table 4 summarizes the applications of LLMs to social and cultural psychology. 


Table 4 Applications of LLMs in social and cultural psychology study. 


Author Research question Research method Key finding 


Atari et al. LLMs have made great Using World Values Survey (WVS) data: place LLMs performed as outliers 
(2023) strides in generating and LLMs on the spectrum of contemporary human on psychometrics compared 


Jin et al. 
(2023) 


Schaaff et 
al. (2023) 


Salah et al. 
(2023) 


Harding et 
al. (2023) 


Patel and 
Fan (2023) 


X. Wang et 
al. (2023) 


Hutson 
(2023) 


Grossmann 
et al. 
(2023) 


analyzing textual data, 
but how similar are they 
to different kinds of 
humans? 


Explore ChatGPT's way 
of thinking within the 
framework of cultural 
psychology, i.e., whether 
it tends to think 
holistically or 
analytically. 


The main research 
question of this thesis 
was to explore the 
empathic abilities of 
ChatGPT. 


Explore the use of 
generative AI (e.g., 
ChatGPT) in social 
psychology research, 
including its potential 
advantages and 
limitations. 


Can LLMs replace 
human research 
participants, especially 
in the field of moral 
psychology research? 


The ability of three 
current language models 
(Bard, GPT 3.5, and 
GPT 4) to recognize and 
describe emotions, as 
well as their levels of 
empathy. 


Emotional Intelligence 
(EI) in LLMs, including 
emotion recognition, 
interpretation, and 
comprehension. 


Explore whether 
LLMs(e.g., GPT-3.5) can 
replace human 
participants and their 
potential applications in 
some fields. 


How to revisit and 
improve social science 
research methods in the 
context of the ongoing 
impact of AI, especially 
LLMs, on social science 
research. 


psychological change. Standard Cognitive Tasks: 
through multiple standard cognitive tasks. Thinking 
styles: comparing GPT responses to extensive cross- 
cultural data from 31 human groups. Self-concept: 
using an established self-concept task. 


Two scales that directly measure cognitive 
processes: the Analytic-Holistic Scale (AHS) and 
the Ternary Categorization Task (TCT). In addition, 
two scales investigating differences in cultural 
thinking values were used: the Dialectical Self Scale 
(DSS) and the Self-Construction Scale (SCS). 


Understanding and expressing emotions: ChatGPT 
generates responses based on prompts and compares 
them to the expected emotion categories. Parallel 
Emotional Responses: analyzing the emotional 
responses generated by ChatGPT when prompted 
with different emotional categories. ChatGPT's 
empathic personality: five psychologically- 
recognized questionnaires (e.g., IRI, EQ, TEQ, PES, 
and AQ) were used for a system-level assessment of 
ChatGPT's level of empathy in different areas. 


Simulating social interactions: providing insights 
into human behavior and social phenomena. 
Analyzing large amounts of textual data: providing 
an in-depth understanding of human behavior and 
social interaction. Extracting insights about 
cognitive processes: providing an in-depth 
understanding of the role of cognitive processes in 
social behavior and social interaction. 


LLMs replace human participants in moral 
psychology: to help generate and refine research 
hypotheses, to pilot test items, and to validate data 
from human subjects. 


The model's ability to describe and recognize 
emotions was assessed using the TAS-20 (Toronto 
Alexithymia Scale). Models were assessed for their 
ability to empathize using the EQ-60 (Empathy 
Quotient). Comparison of model performance to 
human benchmarks to analyze the ability of these 
models in emotion comprehension and expression. 


Assessing Emotional Intelligence in LLMs by 
Developing a Standardized Emotional 
Understanding Test (SECEU): The test is based on 
scenarios designed to elicit positive and negative 
emotions based on school, family, and social 
contexts. Participants were asked to assign 4 
possible emotions to each scenario (e.g., surprise, 
joy, confusion, pride) a total rating of 10. 


Ethical judgment experiment: use GPT-3.5 to assess 
464 ethical scenarios that had been previously 
assessed by human participants. Consumer Behavior 
Experiment: use GPT-3.5 to simulate consumer 
behavior to test its ability to replace human 
participants in market research. Diversity of AI 
Participants: allow GPT-3 to exhibit different 
personality types by providing it with different 
character traits. Social Simulation Experiment: use 
GPT-3 to create a virtual social platform with 1,000 
different users called SimReddit. 


Potential applications of LLMs in a variety of social 
science research methods such as questionnaires, 
behavioral tests, mixed methods analysis of semi- 
structured responses, agent-based modeling (ABM), 
observational studies, and experiments. 


to large-scale cross-cultural 
data. 


In terms of cognitive 
processes, ChatGPT favors 
Eastern holistic thinking. 


However, in terms of value 
judgment, ChatGPT did not 
significantly favor the East or 
the West. 


ChatGPT can understand the 
emotions of others and take 
their perspective but still has 
some difficulty in showing 
higher levels of empathy 
compared to healthy humans. 


ChatGPT has great potential 
in social psychology research 
to help analyze large amounts 
of textual data, simulate social 
interactions as well as provide 
valuable insights into human 
behavior and social dynamics. 


Despite the potential of LLMs 
to simulate human behavior 
and thinking, it is unlikely 
that they can fully replace 
humans as participants in 
scientific research. 


Current LLMs have human- 


equivalent capabilities in 
sentiment recognition and 
description. 


LLMs varied widely in their 
performance on Emotional 
Intelligence (EI). 


LLMs can replace human 
participants to some extent for 
experiments in fields. 


LLMs can be used in place of 
human participants for data 
collection, and also can be 
used in agent-based modeling 


(ABM) to explore how 
individuals with specific 
characteristics and beliefs 


influence subsequent human 


Akata et al. 
(2023) 


Abramski 
et al. 
(2023) 


Yukun et al. 
(2023) 


Suri et al. 
(2023) 


Talboy and 
Fuller 
(2023) 


Park et al. 
(2022) 


X. Liet al. 
(2022) 


Sap et al. 
(2022) 


How LLMs interact with 
other LLMs and simple 
human-like strategies in 
repeated games, and how 
their behavioral 
characteristics are 
reflected in different 
types of games. 


Analyzing and revealing 
the biases of LLMs such 
as GPT-3, GPT-3.5, and 
GPT-4 in describing 
math and STEM 
disciplines. 


Whether LLMs can 
adjust their financial 
decisions and pro-social 
behaviors in response to 
emotional cues, and 
whether this ability 
increases with advances 
in language models. 


Whether LLMs exhibit 
decision-making 
heuristics and biases 
similar to humans, and 
whether LLMs influence 
human cognition and 
decision-making 
processes. 


How do human cognitive 
biases permeate the 
output of LLMs? 


How to generate a 
prototype with real 
social interaction to help 
designers understand and 
adjust to the challenges 
that can arise when 
social computing 
systems are filled at 
scale. 


To assess whether LLMs 
are safe on a 
psychological level and 
whether they exhibit 
features suggestive of 
mental illness. 


Whether LLMs have 
social intelligence and 
Theory of Mind (TOM), 
and how to improve 
these abilities. 


Getting LLMs to play limited repetition games: 
analyze their behavior when playing against other 
LLMs as well as simple human-like strategies. 
Analyzing behavior in different game families: 
Prisoner's Dilemma and War and Gender games. 


Construction of Behavioral Formal Mental 
Networks (BFMNs). Semantic frame analysis: 
analyze the semantic frames generated by LLMs to 
understand how they relate mathematics to other 
concepts. Examining different versions of LLMs: 
comparing GPT-3, GPT-3.5, and GPT-4 perceptions 
and biases in math and STEM domains. Compare 
with high school student data. 


Collecting and analyzing the responses of the robot 
in terms of investment decisions by providing it with 
different emotional cues (fear, joy, or no emotion). 


Measuring the pro-social responses exhibited by an 
AI chatbot through donations to a sick friend by 
providing it with emotional cues of anxiety and joy. 


Anchoring Heuristic: Create a low anchor and a high 
anchor prompt and ask ChatGPT to estimate the 
books’ numbers. Representativeness and 
Availability Heuristic: A structure similar to Linda's 
question was created to test whether ChatGPT would 
choose more typical options. Framing effects 
approach: two scenario prompts (a positive and a 
negative frame), ChatGPT was asked to rate the 
efficacy of the drug. End-beat effect: ask ChatGPT 
to choose one of two identical coins to donate to a 
museum. 


Representativeness heuristic: asking ChatGPT and 
Bard which career a person with a particular 
characteristic would be most likely to pursue. Base 
rate neglect and value selection bias: using a well- 
known base rate neglect prompt. Anchoring and 
adjustment: gave a random number and asked the 
model to indicate whether this number was higher or 
lower than the UN rate for African countries. 
Framing effects: using an established framing 
paradigm, a hypothetical disease and its two 
potential treatment options. 


Test whether social behaviors generated by a social 
simulator are credible across multiple never-before- 
seen communities. Generate 50 discussions from 
GPT-3 for a subreddit created after November 2020. 
Randomly select an actual discussion and a 
generated discussion, and have participants identify 
which one is real. 


Selection of LLMs: LLMs, GPT-3, InstructGPT and 
FLAN-T5-XXL. Psychological tests: Two 
personality tests (SD-3 and BFI) and two well-being 
tests (FS and SWLS) were used to assess the LLMs. 


Assessment Framework: Unbiased prompts were 
designed. For each prompt and statement, three 
outputs were sampled from the LLM and mean 
scores were calculated. 


The SOCIALIQA dataset was used to assess the 
social intelligence and general knowledge of LLMs. 
The TOMI dataset was used to test the ability of 
LLMs to reason about the mental states and reality 
of others. 


interactions. 


With appropriate cues, GPT-4 
can behave more forgivingly 
and coordinate better with 
other players. 


LLMs continue to evolve, it 
may be possible to produce 
less and less biased models, 
and may even help to reduce 
harmful stereotypes in society 
rather than continue to 
propagate them. 


ChatGPT-4 exhibits human- 
like coordinated responses in 
financial decision-making 
and pro-social behavior when 
confronted with emotional 
guidance. 


LLMs and associations may 
partially drive human 
decision-making heuristics, 
even in the absence of 
cognitive and affective 
processes. 


LLMs suffer from cognitive 
biases in their output. 


In the evaluation, participants 
were often unable to 
distinguish between social 
simulations and actual 
community behaviors. 


LLMs exhibit relatively dark 
personality traits in terms of 
psychological safety. 


LLMs still have limitations in 
terms of their performance on 
Social Intelligence and 
Theory of Mind (TOM). 
Increasing model size may 
not be the most effective way 
to realize AI systems with 


social intelligence and theory 
of mind. 


Miotto et Examine personality Personality (HEXACO scale) and values (Human The GPT-3 was similar to the 
al. (2022) traits, held values, and Values Scale) of GPT-3 were assessed using two human sample in terms of 
self-reported validated measurement instruments. Experiments personality traits but showed 
demographic were run with different temperature parameter different personality traits at 
characteristics of the settings to assess whether temperature affects GPT- different temperature settings. 
GPT-3 by assessing them 3's personality and values. Data were analyzed to 
with two validated report personality traits, values, and demographic 
measurement characteristics of GPT-3 at different temperatures 
instruments. and compared to a human baseline study. 
Argyle et Can LLMs serve as Use GPT-3 to generate a story context about The "algorithmic bias" in 
al. (2022) effective proxies for Democrats and Republicans for each person in the GPT-3 is fine-grained and 
modeling the survey, and then ask about the vocabulary newly demographically relevant, 
performance of specific sampled by GPT-3. Use GPT-3 to generate in silico which can enable it to 
human subgroups in samples for 2012, 2016, and 2020 ANES accurately model the 
social science research. participants that match their demographic distribution of responses from 
characteristics and compare them to the various human subgroups. 
corresponding human samples. Use GPT-3 to 
generate associations of complex patterns about a 
wide range of conceptual nodes to assess its 
algorithmic fidelity in complex structural 
associations. 
Trott et al. Can LLMs understand GPT-3 were tested for their performance ona written GPT-3 still performs less well 
(2023) other people's beliefs as version of the False Belief Task (FBT) to assess their than humans and does not 
well as humans do? sensitivity to a character's viewpoint state in a fully explain human behavior 
written passage. By comparing the performance of when dealing with false belief 
human participants and GPT-3 on this task, we to tasks. 
understand whether linguistic input is sufficient to 
account for the ability of humans to reason about 
others' mental states. 
Aher et al. How to simulate Ultimatum Game: to evaluate the accuracy of LLMs In the experiment (Wisdom of 
(2023) multiple human in simulating human behavior by generating random Crowds), LLMs exhibited an 
behaviors using LLMs completions. Garden Path Sentences (GPS): to "over-accuracy distortion", 
and to evaluate the assess the performance of models in which may affect downstream 
accuracy and consistency psycholinguistic research. Milgram Shock applications, such as 


of these models in 
simulating specific 
human behaviors. 


Experiment (MSE): to assess the performance of 
different LLMs in social psychology research. 
Wisdom of Crowds: to assess the performance of 
different LLMs in interdisciplinary research. 


education and the arts. 


6. LLMs as research tools in psychology 


The LLMs are a powerful tool for scientific research and can be used as a research aid to help 
psychologists with everything from literature review, hypothesis generation, experimental design, 


experimental subjects, and data analysis to academic writing and peer review (see Table 5) 


Table 5 LLMs as research tools in psychology study. 


Topic Related study 


LLMs can summarize the researched literature (Dis, Bollen, Zuidema, Rooij, & Bockting, 2023), 
complete literature review tasks (Qureshi et al., 2023), and create literature review articles (Aydin & 
Karaarslan, 2022), at the same time, there are LLM that has been specially trained to accomplish 
systematic literature reviews (Taylor et al., 2022). 


Literature review 


LLMs can generate hypotheses from scientific literature, make inferences based on scientific data, and 
then clarify their conclusions through interpretation (Zheng et al., 2023), and can quickly and 
automatically test these research hypotheses and learn from mistakes (Park et al., 2023). 


Hypothesis generation 


LLMs provide text-based material for experimental design, thereby optimizing the research process and 
reducing experimental complexity. By employing these models, researchers can easily create 
experimental stimuli, develop test items, and even simulate interactive sessions in controlled 


Experimental design 


environments (Aher, Arriaga, & Kalai, 2022; Akata et al., 2023), providing a high degree of control and 
precision to the experimental process. 


LLMs can simulate some human behaviors and responses, which provides an opportunity to test theories 
and hypotheses about human behavior (Grossmann et al., 2023), their use in place of human 
participation in experiments saves time and costs and can be applied to some experiments where human 
participation is not appropriate (Hutson, 2023), they can be combined with factors such as the specific 
research topic, the task, and the sample, and the use of LLM as an alternative to research participants 
where appropriate (Dillion et al., 2023). 


Experimental subjects 


LLMs can efficiently analyze massive amounts of textual data to gain insights into human behavior and 
emotions at an unprecedented scale (Patel & Fan, 2023), can analyze textual data in multiple languages, 
and accurately detect mental structures within it (Rathje et al., 2023), can draw mental profiles from 
social media data (Peters & Matz, 2023). 


Data analysis 


Academic writing LLMs can also help humans in writing (Dergaa et al., 2023; Stokel-Walker, 2022; Van Dis et al., 2023). 


LLMs were used in two natural language processing tasks and a human expert to assess the quality of 
the text, and the results of the assessment were consistent with those of the human expert (Chiang & 

Peer review Lee, 2023), LLMs offer the opportunity to get things done quickly, from Ph.D. students struggling to 
finish their dissertations, to peer reviewers submitting analyses under time pressure(Van Dis et al., 
2023). 


6.1. Automated literature review and meta-analysis 


Conducting a literature review meta-analysis is a complex, arduous process that requires significant 
expertise and time (Michelson & Reuter, 2019). Nature reports that researchers have used ChatGPT as a 
research assistant to summarize the literature of studies (Dis, Bollen, Zuidema, Rooij, & Bockting, 2023). In 
one study, researchers utilized ChatGPT to complete some systematic literature review tasks (Qureshi et al., 
2023). In another study, a literature review article was created using ChatGPT with the application of digital 
twins in the health field as the theme, and the results showed that knowledge compilation and representation 
were accelerated with the help of ChatGPT. However, academic validity needs to be further verified (Aydin 
& Karaarslan, 2022). Meanwhile, there are also LLMs specifically trained by researchers for the practical 
needs of scientific research (Taylor et al., 2022), which can accomplish a systematic literature review. 

In summary, LLMs speed up the process of literature review and meta-analysis. Researchers can use 
these models to systematically review and synthesize existing research, improving the efficiency of 


evidence-based psychology. 


6.2. Hypothesis generation and experimental design 


Hypothesis-driven research is at the core of scientific activity. LLMs can generate hypotheses from 
scientific literature, make inferences based on scientific data, and then clarify their conclusions through 
interpretation (Zheng et al., 2023). Although LLMs are capable of generating research hypotheses and 
becoming better "hypothesis machines," they will need to improve their logical and mathematical derivation 
capabilities in the future to eliminate more factual errors and be able to quickly and automatically test these 
research hypotheses to learn from their mistakes (Y. Park et al., 2023). As an innovative tool, LLMs have 


great potential for use in psychological experiments. They can provide text-based material for experimental 


design, optimizing the research process and reducing experimental complexity. By employing these models, 
researchers can easily create experimental stimuli, develop test items, and even simulate interactive sessions 
in controlled environments (Aher, Arriaga, & Kalai, 2022; Akata et al., 2023), providing a high degree of 
control and precision to the experimental process. 

In conclusion, LLMs provide a powerful and flexible tool for psychological research, from hypothesis 


generation to experimental design, to help researchers achieve more efficient and precise research goals. 


6.3. As subjects in a psychological experiment 


Although LLMs can simulate some human behaviors and responses, which provides an opportunity to 
test theories and hypotheses about human behavior (Grossmann et al.,2023), there is still some controversy 
as to whether LLMs can be used as a substitute for human subjects to participate in psychological research. 
Some researchers have argued that LLMs can be used in psychology as a substitute for human participation 
in experiments to save time and cost and can be applied to experiments that are not suitable for human 
participation while recognizing that these models can have some problems (e.g., bias and insufficiently 
trained data, etc.) (Hutson,2023). Some other researchers have proposed using LLMs as an alternative 
method of studying participants when appropriate, based on their performance in conjunction with factors 
such as specific research topics, tasks, and samples (Dillion et al., 2023). Some researchers believe that 
although LLMs will significantly impact scientific research, they are unlikely to replace human participants 
in any meaningful way (Harding et al., 2023). Although there is some controversy about whether LLMs can 
replace humans as experimental subjects, some studies of LLMs as subjects have shown that LLMs perform 
similarly to humans (Orru,Piarulli,Conversano,&Gemignani, 2023;PeterS.Park,Schoenegger,& Zhu, 2023), 
which may indicate the potential of LLMs to replace humans as subjects. 

In conclusion, although LLMs can simulate human judgments, their understanding of human thinking 
is still limited, and their output should be validated and interpreted with caution when chosen as 


psychological subjects. 


6.4. Tools for data analysis 


Various forms of AI have long been used to analyze psychological data, such as flight data for pilot 
screening (Ke et al., 2023). Machine learning algorithms have facilitated the processing of large datasets, 
identifying patterns and correlations that may have been overlooked. However, LLMs take this capability to 


a new level. These models can efficiently analyze massive amounts of textual data to gain insights into 


human behavior and emotions on an unprecedented scale (Patel & Fan, 2023). Psychological research means 
faster and more comprehensive data analysis, leading to more reliable and nuanced findings. LLMs can 
analyze textual data in multiple languages, accurately detect psychological structures within them (Rathje et 
al., 2023), and go for psychological profiles from social media data (Peters & Matz, 2023). In addition, 
LLMs have demonstrated a degree of competence in the medical field; LLMs can predict the optimal 
neuroradiographic imaging modality for a given clinical presentation, even though LLMs do not outperform 
experienced neuroradiologists, suggesting the need for continued improvement in the medical context 
(Nazario-Johnson et al., 2023). These findings demonstrate the great potential of LLMs in evaluating and 


analyzing data. 


6.5. Paper writing and peer review tools 


It has been argued that LLMs are not currently a complete replacement for human writing, but instead 
answer questions and generate naturally fluent and informative content compellingly, but with no real 
intelligence, just generated text based on patterns of previously seen words (Stokel-Walker, 2022). In one 
study, students used ChatGPT as an aid in their writing. However, the results showed that the experimental 
group that used ChatGPT was similar to the control group in terms of writing quality, speed, and authenticity, 
and the authors suggest that this may be because experienced researchers can better guide ChatGPT to high- 
quality information. In contrast, the students may be having ChatGPT difficulties (Bašić, Banovac, Kružić, 
& Jerković, 2023). In another article, the authors discuss the prospects and potential threats of ChatGPT in 
academic writing and emphasize that using ChatGPT in academic research should prioritize peer-reviewed 
scholarly sources. In addition, the article mentions the potential advantages of ChatGPT in academic research, 
including the handling of large amounts of textual data, automatic generation of abstracts, and research 
questions (Dergaa, Chamari, Zmijewski, & Saad, 2023). Additionally, LLMs have the potential for peer 
review (Van Dis et al., 2023), where the results of LLM’s evaluation in a text evaluation task are consistent 
with those of human experts (Chiang & Lee, 2023). 

In conclusion, LLMs such as ChatGPT are potent tools for academic writing, capable of processing 
large amounts of textual data and automating tasks that were previously done manually; it can be used to 
scan academic papers and extract essential details, generate objective and unbiased abstracts, and create 
research questions. It also has the potential to be applied to peer review of papers. However, researchers 


must exercise caution when using them as they can also integrate false or biased information into papers, 


leading to unintentional plagiarism and misattribution of concepts (Dis, Bollen, Zuidema, Rooij, & Bockting, 


2023). 


7. Challenges and future directions 


7.1. Challenges and limitations 


Although the potential of LLMs to simulate complex cognitive processes is enormous, providing 
researchers with new tools to explore the mechanisms of human cognition and behavior opportunities for a 
wide range of applications in a variety of fields, including clinical and counseling, educational and 
developmental, and social and cultural psychology. However, the output of LLMs should not be mistaken 
for the presence of thoughts but instead be viewed as complex pattern matching based on probabilistic 
modeling (Floridi & Chiriatti, 2020). Although its performance is impressive, this is different from the model 
being conscious or genuinely understanding. The interpretation of its capabilities must be based on 
understanding its limitations and the nature of its operation, which may differ fundamentally from human 
cognition. Therefore, it is essential to focus on the potential of LLMs in psychological research while at the 
same time facing up to the technical and ethical challenges that may arise. 

First, despite the emergence of competence in the LLM (Wei et al., 2022), its internal working 
mechanism remains a black box from a cognitive and behavioral psychology perspective. For example, 
LLMs perform impressively on tasks requiring formal linguistic competence (including knowledge of the 
rules and patterns of a particular language) but fail many tests requiring functional competence (the set of 
cognitive abilities needed to understand and use language in the real world) (Mahowald et al., 2023), excels 
in analogical reasoning and moral reasoning tasks, but performs poorly on spatial reasoning tasks (Agrawal, 
2023). 

Second, while LLMs have accelerated the use of AI technology in clinical and counseling 
psychotherapy, privacy and ethical issues may arise (Graber-Stiehl, 2023). For example, it has been shown 
that gatekeepers, patients, and even mental health professionals who rely on ChatGPT to assess suicide risk 
or use it as an aid to improve decision-making may receive inaccurate assessments that underestimate actual 
suicide risk (Elyoseph & Levkovich, 2023), and may also bias clinician decision-making, which can lead to 
healthcare inequity (Pal et al., 2023). In addition, LLMs in psychiatry research and practice have been 


associated with potential bias and privacy violations (Zhong et al., 2023). 


Third, in fields such as educational, developmental, and social and cultural psychology, LLMs face 
problems and challenges in their application. For example, when applied in education, LLMs have the 
potential for output bias and misuse (Kasneci et al., 2023). One study found that the texts generated by 
ChatGPT were not always consistent or logical and sometimes even contradictory (Stojanov, 2023). In the 
field of social and cultural psychology, LLMs exhibit cognitive biases (Talboy & Fuller, 2023) and cultural 
biases (Atari et al., 2023) similar to those of humans, in addition to implicitly darker personality patterns (X. 
Li et al., 2022). Field Bender et al. (2021) have argued that training data for LLMs may reflect social biases 
that continue to be perpetuated in research settings. 

Finally, as an aid to scientific research, LLMs have some limitations. For example, when it comes to 
writing, LLMs currently do not fully replace humans. Instead, they answer questions and generate naturally 
flowing and informative content compellingly, without real intelligence, only generating text based on 
previously seen word patterns (Stokel- Walker, 2022). Although macrolanguage models can simulate human 
judgments when used as experimental subjects, there are still limits to their understanding of human thought 
(Dillion et al., 2023). Field Van Dis et al. (2023) noted that LLMs may accelerate innovation, shorten 
publication times, and increase scientific diversity and equality. However, they may also reduce the quality 
and transparency of research and fundamentally alter scientists' autonomy as human researchers. 

In summary, while LLMs offer extraordinary capabilities for psychological research, they also present 
challenges related to bias, ethical issues, data security, transparency, and technical expertise. Researchers 
should be fully aware of these challenges when using big language models and take steps to address them 
responsibly in their research projects. The following table summarizes the challenges and limitations of 


LLMs in psychological applications (see Table 6). 


Table 6 Challenges and limitations of LLMs in psychological applications. 


Author Domain Challenges and limitations 


Mitchell (2023) Cognition and Behavior Lack of real-world understanding. Lack of abstract reasoning. Lack of understanding of 
user intent. 


Stella et al. Cognition and Behavior Lack of meta-knowledge leads to some limitations of LLMs in processing information). 

(2023) Lack of curiosity and which raises questions about the source of the "creativity" they 
exhibit). Hallucinations: LLMs unconsciously fabricate information and are unable to 
identify the source of their knowledge. 


Sartori and Orrù Cognition and Behavior Lack of causal reasoning ability: they may not perform well in causal reasoning. 


(2023) Dependence on training data: if the training data is biased, the model may not perform well 


in other tasks. Lack of creativity and imagination. 


Goertzel (2023) Cognition and Behavior Lack of autonomy: LLMs are unable to systematically pursue complex goals. Lack of 
abstract reasoning: LLMs perform poorly in performing highly complex multi-step 
reasoning. Lack of self-understanding: LLMs are unable to reflect fully on their behavior 
and limitations. Lack of in-depth understanding of the real world: Leading to potential 
problems when they perform tasks involving the real world. 


Peng et al. 
(2023) 


Holtzman et al. 
(2023) 


Seals and Shalin 
(2023) 


Stade et al. 
(2023) 


Li et al. (2023) 


Kasneci et al. 
(2023) 


Fecher et al. 
(2023) 


Atari et al. 
(2023) 


Park et al. 
(2023) 


Salah et al. 
(2023) 


Hayes (2023) 


Miotto et al. 
(2022) 


Bender et al. 
(2021) 


Tamkin et al. 
(2021) 


Brown et al. 
(2020) 


Cognition and Behavior 


Cognition and Behavior 


Cognition and Behavior 


Clinic and Counseling 


Education and 
Development 


Education and 
Development 


Society and Culture 


Society and Culture 


Society and Culture 


Society and Culture 


Society and Culture 


Society and Culture 


Society and Culture 


Society and Culture 


Society and Culture 


Forgetting problem: LLMs may forget previously learned knowledge when learning new 
tasks. Inadequate common-sense reasoning: LLMs perform poorly on common sense 
reasoning tasks. Lack of systematic demonstration of problem-solving skills: studies have 
found that LLMs occasionally perform poorly when solving problems. 


Lack of clear understanding of model behavior: making it difficult to improve model 
performance and solve problems. Lack of formal description of model behavior: the lack 
of formal description of model behavior makes it difficult for researchers to systematically 
analyze model behavior and thus find a unified theory to explain model behavior. Lack of 
interpretability of model behavior: it difficult for researchers to understand how models 
can perform well on some tasks and poorly on others. 


ChatGPT and human-generated analogies differed in these stylistic dimensions, these 
lexical features, their choice of words for these features and these devices that help readers 
understand text. ChatGPT may lack human cognitive and psycholinguistic features when 
generating analogies. 


Technical limitations: may have difficulty in assessing patients for suicide risk, substance 
abuse, safety issues, medical comorbidities, and life events). Connecting with the patient: 
may have difficulty interpreting nonverbal behaviors. Problems with full autonomy and 
therapeutic relationship (e.g., altering the patient's existing relationships or social skills). 


Academic integrity and the definition of authorship. Assessment methods and educational 
consequences. Data privacy and security. Teacher-student relationship. Students’ critical 
thinking skills. Misinformation and bias. Interpersonal communication skills development. 


Technical limitations: leading to insufficient personalization and adaptation. Bias and 
equity: affecting teaching and learning processes and outcomes. Over-reliance on models: 
leading to a decline in creativity, critical thinking, and problem-solving skills. 


Inadequate knowledge and expertise: Many educators and institutions may lack the 
knowledge and expertise to effectively integrate new technologies. Maintenance costs. 


Multilingual support and equitable access. 


Liability issues: challenging traditional mechanisms of authorship and liability. Bias 
issues: affecting the objectivity and impartiality of science. Privacy and data protection 
issues: may be privacy issues with the training data of LLMs. Intellectual property issues: 
potential legal disputes. Environmental issues: generating large amounts of carbon 
emissions, which can have a negative impact on the environment. 


Ignoring global psychological diversity (e.g., tend to favor the psychological 
characteristics of WEIRD societies) and which can lead to prejudice and discrimination 
against people of other cultures and backgrounds. Differences in values and moral 
judgments and which can lead to problems of communication and understanding in 
multicultural societies. Self-identity and perceived social roles and which may lead to 
stereotypes and misconceptions about non-WEIRD populations). 


Reduced innovation and development, bias and discrimination, culture clash and conflict, 
differences in values and morals and entrenchment of the status quo. 


Limited understanding of social context: Although ChatGPT performs well in syntax and 
general semantics, it still has limitations in capturing the nuances of social language. 
Ethical challenges: Al-generated fake content can lead to ethical issues including digital 
personhood, informed consent, potential manipulation, and the implications of using AI to 
simulate human interactions. 


Potential biases: if the training data contain biases, LLMs may learn and replicate them. 
Data privacy and consent issues: Text generated using LLMs may involve data privacy and 
consent issues. Output may be non-humanly understandable: although LLMs generate text 
that closely resembles human language, they do not truly understand the content and may 
generate absurd or misleading responses. 


Bias and discrimination: LLMs may be affected by biases in the training data, which can 
produce unfair results, such as reinforcing sexism in the translation of job advertisements. 
Responsibility and control: Due to the complexity of language models, it is difficult to 
determine who is responsible for the model's output, which can lead to attribution of 
problems and lack of controls. 


Potential Harm: LLMs may lead to the propagation of harmful ideas such as stereotyping, 
discrimination, and extremism, and may lead to misinformation and bullying when 
generating text. Data bias and unfairness: leading to potential harm to marginalized 
communities. Automating bias: exacerbating existing biases and discrimination. 
Enhancement of authoritative viewpoints: LLMs may reinforce dominant viewpoints in 
the training data, further undermining marginalized people. 


Alignment: In order to better align models with human values, algorithmic improvements 
are needed to increase factual accuracy and robustness against adversarial samples. In 
addition, appropriate values need to be made explicit for different usage scenarios. Societal 
Impact: Widespread use of LLMs may lead to problems such as information leakage and 
amplification of bias. 


Misuse of language modeling: GPT-3 may be used to generate fake news, spread extremist 
ideas, conduct cyber-attacks and other malicious uses. Fairness, bias, and representation: 
GPT-3 may carry bias against gender, race, and religion, among others, sparking related 
controversies. News generation: News generated by GPT-3 may be difficult to distinguish 


from real news, leading to confusing and misleading information. 


Sallam (2023) Research tools Plagiarism: content generated by ChatGPT may be considered plagiarized, violating 
academic norms. Copyright issues: Is the generated content owned by ChatGPT or by the 
user? Transparency issues: The workings of ChatGPT may not be transparent, making it 
difficult for users to understand the source of generated content. Liability issues: who is 
responsible for ChatGPT when generating incorrect content? 


Gupta et al. Research tools Transparency and Explanation: The working mechanism of generative AI models may be 
(2023) difficult to explain, which may lead users to doubt the credibility of the generated content. 
Legal and Ethical Issues: Generative AI models may involve intellectual property, privacy, 
and ethical issues, requiring attention to compliance with relevant laws and regulations 


during use. 
Dergaa et al. Research tools Integration of erroneous or biased information. Problems with citing original sources and 
(2023) authors. Impact on academic integrity and quality. Increased inequity and inequality: 


Difficulty in recognizing Al-generated content. Academic evaluation and recognition 
issues. Direct replacement for academic researchers: ChatGPT is not a complete 
replacement for academic researchers as it has limitations in certain types of academic 


research. 
Peters and Matz Research tools User privacy: LLMs can infer psychological traits from a user's social media data, which 
(2023) may violate the user's privacy. Potential bias: LLMs may create potential bias in the 


inference process, which may lead to unfair treatment of specific groups (e.g., gender, age, 
etc.). Data security: if the inferential power of LLMs is used maliciously, it may lead to 
data leakage, with serious implications for users' mental health. 


Y. Liu et al. Research tools Academic misconduct: ChatGPT may be used for academic cheating, such as generating 

(2023) false papers or assignments. Challenges in the medical field: ChatGPT has limitations in 
medical image analysis, which may lead to wrong diagnosis and jeopardize patients! 
health. 


7.2. Future directions and emergent trends 


The LLMs have begun to be used in different areas of psychology, especially in cognitive and 
behavioral, clinical and counseling, educational and developmental, and social and cultural psychology. As 
the capabilities of LLMs are further enhanced, their applications in psychology still have the potential to 
continue to develop in the future. 

First, in the field of cognitive and behavioral psychology, with the emergence of multimodal LLMs 
(OpenAT, 2023), on the one hand, it is possible to combine visual and auditory information with textual data 
to understand better and model emotions, behaviors, and mental states for cognition, on the other hand, it is 
possible to use neuroimaging data to inform the architectures and parameters of LLMs, and to integrate this 
information with traditional textual data integration to create more accurate and biologically sound models 
of human language and thought. 

Second, in the field of clinical and counseling psychology, on the one hand, personal data such as social 
media posts, medical records, or wearable device data can be used to create tailored and personalized LLMs 
that provide more accurate and relevant insights into an individual's state of mind, on the other hand, the 
strengths of combining the clinical and counseling expertise of the human psyche with the scalability and 
computational power of an LLM can be combined to create new diagnostic treatment and intervention tools. 


In addition, in the fields of educational and developmental, and social and cultural psychology, it is essential 


to build ethical LLMs and to ensure that they are designed and deployed in a way that respects privacy and 
uses data fairly and responsibly. 

Ultimately, LLMs are a systematic project whose future development cannot be achieved without the 
interdisciplinary collaboration of researchers in fields as diverse as psychology, computer science, and 
linguistics, and for psychology researchers, an accessible open-source large language modeling framework 
and tools may be an integral part of future research efforts. The following table summarizes LLMs' future 


directions and emergent trends in psychological applications (see Table 7). 


Table 7 Future directions and emergent trends of LLMs in psychological applications. 


Author 
D'Oria (2023) 


Crockett and 


Domain 


Cognition and Behavior 


Cognition and Behavior 


Future directions and emergent trends 


Delving into Human-Computer Interaction (HCI) to understand Al's ability to mimic 
human behavior. Exploring how AI language modeling can be applied in the human 
sciences to improve research efficiency and quality 


Focus on the costs of adopting alternative human narratives in cognitive science research, 


Messeri (2023) such as masking the human labor behind them and the impact on human well-being. 
Concern about the impact of technological developments on scientific work and human 
understanding to ensure that cognitive scientists remain proactive in technological 
advances. 

Binz and Schulz Cognition and Behavior Explore ways to make LLMs more stable and robust in the face of descriptive tasks. 

(20220) Investigate whether LLMs can learn to explore purposefully and how to better utilize 
causal knowledge in tasks. Analyze the performance of LLMs in different tasks and 
contexts to see if they can adapt like humans. Explore how LLMs develop and refine their 
cognitive abilities during natural interactions with humans. 

Huang and Cognition and Behavior Improve the reasoning ability of LLMs to encourage reasoning by optimizing training data, 

Chang (2022) model architecture, and optimization goals. Develop more appropriate evaluation methods 


Abd-Alrazaq et 
al. (2019) 


Stade et al. 
(2023) 


Demszky et al. 
(2023) 


Hagendorff 
(2023) 


Sap et al. (2022) 


Argyle et al. 
(2022) 


Clinic and Counseling 


Clinic and Counseling 


Clinic and Counseling 


Education and 
Development 


Society and Culture 


Society and Culture 


and benchmarks to measure the reasoning ability of LLMs to better reflect the true 
reasoning ability of the models. Investigate the potential of LLMs in different applications 
(e.g., problem solving, decision making and planning tasks). Explore other forms of 
reasoning (e.g., inductive and retrospective reasoning). 


Develop more chatbots for people with mental illness, especially for those with disorders 
such as schizophrenia, obsessive-compulsive disorder and bipolar disorder. 


Implement more chatbots in developing countries to address the shortage of mental health 
professionals. Conduct more randomized controlled trials to evaluate the effectiveness of 
chatbots in mental health. 


Developing new therapeutic techniques and evidence-based practices (EBPs). Focus on 
evidence-based practices first: to create meaningful clinical impact in the short term, 
clinical LLM applications based on existing evidence-based psychotherapies and 
techniques will have the greatest chance of success. Involve interdisciplinary 
collaboration. Focuses on therapist and patient trust and usability. Criteria for designing 
effective clinical LLMs. 


Development of high-quality cornerstone datasets: these datasets need to encompass 
populations and psychological constructs of interest and be associated with 
psychologically important outcomes (e.g., actual behaviors, mindfulness, health, and 
mental well-being). Focus on future research directions in consumer neuroscience and 
clinical neuroscience: research in these areas may involve the neural systems of marketing- 
related behaviors, decision neuroscience, neuroeconomics, and more. 


Developmental psychology: examining how LLMs develop cognitively, socially, and 
emotionally over the lifespan and how these models can be optimized for specific tasks 
and situations. Learning psychology: studying how LLMs acquire and retain knowledge 
and skills, and how to optimize these models to improve learning. 


Explore more interactive and empirical training methods to help LLMs acquire true social 
intelligence and theoretical mental abilities. Investigate ways to combine static text with 
rich social intelligence and interaction data to improve social intelligence in LLMs. 
Investigate the theoretical-psychological abilities of LLMs in more naturalistic settings to 
reveal their performance in real-world scenarios. 


Investigate the algorithmic fidelity of the GPT-3 model and how appropriate conditioning 
can allow the model to accurately simulate the response distributions of various human 


subgroups. Created "in silico samples" by conditioning on the socio-demographic 
backgrounds of real human participants in multiple large U.S. surveys. 


Schaaff et al. Society and Culture Developing more advanced models: to more accurately capture the emotional context of 

(2023) conversations and improve emotional understanding and expression. Measuring the 
emotional capabilities of bots: to investigate how to assess the emotional capabilities of 
chatbots in order to better understand how they behave when interacting with humans. 
Explore the use of ChatGPT as a support tool: investigate how ChatGPT can be used to 
support people more empathetically and improve human well-being. 


Ziems et al. Society and Culture Cross-cultural CSS research: future research should separately consider the utility of 

(2023) LLMs for cross-cultural CSS in order to better serve social science research in different 
cultural contexts. Future research could explore contrastive or causal explanations in 
LLMs. New paradigms for social science and AI collaboration. 


Van Dis et al. Research tools Invest in truly open LLMs: develop and implement open-source AI technologies to 

(2023) increase transparency and democratic control. Embrace the advantages of AI: utilize AI to 
accelerate innovation and breakthroughs at all academic stages, while focusing on issues 
of ethics and human autonomy. Broaden the discussion: organize international forums to 
discuss the development and responsible use of LLMs in research, including issues of 
diversity and inequality. 


Fecher et al. Research tools Analyzing the risks and opportunities of LLMs for science systems. Examining how LLMs 

(2023) affect academic quality assurance mechanisms, academic misconduct, and scientific 
integrity. Exploring the impact of LLMs on academic reputation, evaluation systems, and 
knowledge dissemination. Examining how to balance the potential benefits from LLMs 
with adherence to scientific principles. 


8. Conclusion 


With the rapid development of AI technology, especially the continuous advancement of LLMs such as 
the GPT family, we have entered a new era characterized by an unprecedented level of machines able to 
understand and generate human language. This development is not just a technological breakthrough for the 
field of psychology but opens the door to a range of potential applications. 

First, in the field of cognitive and behavioral psychology, LLMs are excelling in a variety of cognitive 
tasks. Although there are still limitations in causal cognition and planning, these models resurrect the 
principle of association, demonstrating the ability to associate at a distance and reason in complex ways. At 
the same time, the ability to adapt LLMs to cognitive models is a significant strength of psychological 
research, allowing new explorations of human cognitive and behavioral processing mechanisms. 

Second, in clinical and counseling psychology, LLMs can be used as a preliminary diagnostic tool for 
mental health. While traditional mental health diagnosis relies on the experience of professionals and direct 
interaction with patients, LLMs can quickly identify potential mental health problems, such as depression 
and anxiety, by analyzing an individual's verbal expressions and textual content. Of course, such a diagnosis 
cannot wholly replace a professional psychological assessment. However, it can serve as an effective adjunct 
to help psychologists understand a patient's condition more quickly or play a role in primary mental health 


interventions. Meanwhile, personalized psychological intervention is another critical application direction 


of the LLM. By combining information about an individual's health data and lifestyle habits, these models 
can provide tailored psychological advice and intervention programs. This personalized approach may be 
crucial for improving the effectiveness of psychological interventions. 

Third, in educational and developmental and social and cultural psychology, LLMs have the same 
potential for application. For example, these models provide interactive and personalized learning 
experiences or generate research tasks based on real-life case applications that increase motivation and 
enhance learning. In addition, by analyzing large amounts of social media data, these models can help 
researchers track and analyze public sentiment changes to understand psychosocial dynamics better. 

Finally, in psychological research, LLMs can drastically improve research efficiency. Researchers can 
use these models to quickly organize and analyze large amounts of literature, thus saving time. In addition, 
these models can also assist in experimental design, data analysis, and even writing papers, making 
psychological research more efficient and precise. 

In summary, the applications of LLMs in psychology are promising. From research aids to cognitive 
modeling, from individualized interventions to personalized learning, and from the cognitive abilities of 
individuals to the social interactions of groups, these models have the potential to dramatically improve the 
understanding of the patterns of human communication, thought processes, and behaviors that lead to the 
development of more sophisticated theories of mind. However, despite the great potential for applying LLMs 
in psychology, being wary of the risks and challenges involved is essential. Ensuring these applications 
adhere to ethical standards is vital, especially in protecting individual privacy and data security. It is also 
important to realize that no matter how technologically advanced, LLMs can only partially replace the 
judgment and experience of human professionals. Therefore, these models should be viewed as an aid rather 


than an all-in-one solution. 
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